Efficient Explorative Key-Term Selection Strategies for Conversational Contextual Bandits

Authors

Abstract

Conversational contextual bandits elicit user preferences by occasionally querying for explicit feedback on key-terms to accelerate learning. However, there are aspects of existing approaches which limit their performance. First, information gained from key-term-level conversations and arm-level recommendations is not appropriately incorporated to speed up learning. Second, it is important to ask explorative key-terms to quickly elicit the user's potential interests in various domains, which accelerates the convergence of preference estimation but has never been considered in existing works. To tackle these issues, we first propose "ConLinUCB", a general framework for conversational bandits with better information incorporation, combining arm-level and key-term-level feedback to estimate user preference in one step at each time. Based on this framework, we further design two bandit algorithms with explorative key-term selection strategies, ConLinUCB-BS and ConLinUCB-MCR. We prove tighter regret upper bounds for our proposed algorithms; in particular, ConLinUCB-BS achieves a better bound than the previous result. Extensive experiments on synthetic and real-world data show significant advantages of our algorithms in learning accuracy (up to 54% improvement) and computational efficiency (up to 72% improvement) compared to the classic ConUCB algorithm, showing the benefit to recommender systems.
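The core idea of the framework described above can be sketched as follows. This is a minimal, hypothetical illustration (not the authors' code): a LinUCB-style learner that folds both arm-level rewards and key-term-level conversational feedback into a single ridge-regression estimate of the user preference vector, so each kind of feedback tightens the same estimate in one step.

```python
import numpy as np

class ConversationalLinUCB:
    """Sketch of a conversational LinUCB-style learner (illustrative only)."""

    def __init__(self, dim, lam=1.0, alpha=1.0):
        self.A = lam * np.eye(dim)   # Gram matrix shared by both feedback levels
        self.b = np.zeros(dim)       # accumulated feature-weighted feedback
        self.alpha = alpha           # exploration width

    def theta(self):
        # Single-step joint estimate of the user preference vector.
        return np.linalg.solve(self.A, self.b)

    def select_arm(self, arm_features):
        # Standard LinUCB rule: estimated reward plus a confidence bonus.
        theta = self.theta()
        A_inv = np.linalg.inv(self.A)
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in arm_features]
        return int(np.argmax(scores))

    def update(self, x, feedback):
        # The same update serves an arm's observed reward and a key-term's
        # conversational feedback: both shrink the shared confidence set.
        self.A += np.outer(x, x)
        self.b += feedback * x
```

The point of the shared `A`/`b` statistics is that a key-term conversation immediately sharpens subsequent arm selection, rather than maintaining two separate estimators as in earlier approaches.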


Related articles

Efficient Optimal Learning for Contextual Bandits

We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal regret. Our algorithm uses a cost sensitive classification learner as an oracle and has a running time polylog(N), where N is the number of classification rules a...


Efficient Contextual Bandits in Non-stationary Worlds

Most contextual bandit algorithms minimize regret to the best fixed policy, a questionable benchmark for non-stationary environments ubiquitous in applications. In this work, we obtain efficient contextual bandit algorithms with strong guarantees for alternate notions of regret suited to these non-stationary environments. Two of our algorithms equip existing methods for i.i.d problems with sophi...


A Contextual Bandits Framework for Personalized Learning Action Selection

Recent developments in machine learning have the potential to revolutionize education by providing an optimized, personalized learning experience for each student. We study the problem of selecting the best personalized learning action that each student should take next given their learning history; possible actions could include reading a textbook section, watching a lecture video, interacting...


Sample-efficient Nonstationary Policy Evaluation for Contextual Bandits

We present and prove properties of a new offline policy evaluator for an exploration learning setting which is superior to previous evaluators. In particular, it simultaneously and correctly incorporates techniques from importance weighting, doubly robust evaluation, and nonstationary policy evaluation approaches. In addition, our approach allows generating longer histories by careful control o...


A Time and Space Efficient Algorithm for Contextual Linear Bandits

We consider a multi-armed bandit problem where payoffs are a linear function of an observed stochastic contextual variable. In the scenario where there exists a gap between optimal and suboptimal rewards, several algorithms have been proposed that achieve O(log T ) regret after T time steps. However, proposed methods either have a computation complexity per iteration that scales linearly with T...
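The payoff model in the snippet above can be made concrete with a toy simulation. This is an assumption-laden illustration, not the paper's algorithm: each arm has an unknown weight vector, and the observed payoff is that vector's dot product with the observed context plus noise.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms = 4, 3
true_weights = rng.normal(size=(n_arms, d))  # unknown to the learner

def payoff(arm, context, noise_scale=0.1):
    # Expected reward is linear in the context; observations are noisy.
    return true_weights[arm] @ context + noise_scale * rng.normal()

context = rng.normal(size=d)
rewards = [payoff(a, context) for a in range(n_arms)]
```

A learner in this setting must estimate `true_weights` from such noisy observations while trading off exploration against exploitation.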



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i8.26225